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Abstract 

In this paper we propose and advocate the use of the so called Levy flights as a driving 
mechanism for a class of stochastic optimization computations. This proposal, for some reasons 
overlooked until now, is - in author's opinion ~ very appropriate to satisfy the need for algorithm, 
which is capable of generating trial steps of very different length in the search space. The required 
balance between short and long steps can be easily and fully controlled. A simple example of 
approximated Levy distribution, implemented in FORTRAN 77, is given. We also discuss the 
physical grounds of presented methods. 
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I. Optimization algorithms and physics 

More and more global optimization tasks are completed today using algorithms origi- 
nating from mimicking the way the Nature solves them. We have two branches of science, 
which describe the world around us, namely biology (for living things) and physics (for 
the rest of it). The problem of global optimization (either minimization or maximiza- 
tion, with or without constraints) of the objective function of many variables still remains 
a challenge for practitioners. There is no single, universal and deterministic algorithm 
capable of solving all kinds of optimization problems: those involving smooth as well as 
non-smooth objective functions, mixed integer-real-boolean valued unknowns, etc. 

Classical mathematical analysis was long the only tool for finding extrema of functions 
of many variables. Unfortunately it is of very limited use in many practical applications, 
especially when the objective function is not different iable at least once. On the other 
hand, we are ready to accept the solutions, which are not perfect in the mathematical 
sense (often called "^-optimal"), but are sufficiently close to them. 

To overcome such difficulties, researchers in various fields of science and engineering 
turned to stochastic algorithms. There are several kinds of arguments for doing so. The 
first, and certainly not the most important one, is increasing availability of the computing 
power. We are able to examine, usually in a fraction of a second, many trial solutions of 
the optimization problem under study. Loosely speaking, this is the base of a rich family 
of Monte-Carlo-type optimization algorithms. Trying to mimic Nature's actions is another 
justification for rich variety of optimization algorithms, just because Nature seems very 
successful. Let's put aside algorithms of genetic type, grown on biological grounds, and 
concentrate instead on those, which are based on behavior of purely physical systems. 
Various physical phenomena were taken into account, mostly from classical mechanics 
of a single particle (deterministic algorithms of "gradient type") and thermodynamics 
(simulated annealing) as models for optimization procedures. 

In this paper we propose the diffusion processes and quantum tunneling as a base for a 
class of stochastic optimization routines. 



II. Diffusive processes as a model for optimization procedure 



Consider the simplest version of familiar Monte-Carlo optimization procedure. Its ope- 
ration may be summarized as a random walk in the search space (bounded or not) and 
sampling the values of objective function in visited points. The trajectory of such a ran- 
dom walker is very similar to the trajectory of physical particle subjected to Brownian 
motion. If we consider many random walkers at the same time (multiple start point 
Monte-Carlo optimization algorithm), then emerging set of trajectories resembles closely 
the diffusion process. In the derivation of the law of Brownian motion one assumes that 
the lengths of individual "jumps" are not equal to each other but are distributed nor- 
mally, as a result of a huge number (estimated as 10^ — 10^) of independent "kicks" from 
surrounding molecules. This is practical manifestation of the law of large numbers, known 
also as the Central Limit Theorem. Under those assumptions it may be shown (Einstein, 
Smoluchowski) that the average distance, R, of a random walker from starting point is 
a function of time, t, and can be expressed as 



where V is the diffusion constant. 

The formula (|l]) was later confirmed experimentally for small particles suspended in 
liquids. This, in turn, made possible to estimate the value of very important physical 
constant, the Avogadro's number, Na, thus finally confirming the atomic structure of 
matter. It is worth noting that the early value of Na, obtained this way, differs less than 
1% from the one known today, almost 100 years later. 

Extensive investigations of diffusion processes revealed, that at least some of them must 
be governed by other mechanisms, different from familiar Brownian motion. They were 
classified as subdiffusion (z/ < 1, see Eq. |l|) and enhanced diffusion [v > 1). It is often said 
that the enhanced diffusion is governed by Levy flights, which will be explained below. 

Levy flights 

Paul Levy (1886 — 1971), the French mathematician, considered in thirties (XX century) 
the following problem |l|: 

What, if any, should be the probabihty density of N independent, identically 
distributed random variables (iid) Xi, X2, . . . , X^r to satisfy the requirement that 
the probability density of their sum Xi + X2 + . . . + X^ has the same functional 
form? 

Today we could say, that Levy tried to find a class of self-similar objects, known as fractals 
since Benoit Mandelbrot had invented them, much later, in 1968. 

Well known answer to Levy's problem was based on famous Central Limit Theorem, 
which, in most widely known version, states that the sum of iid random variables has 
normal probability density. We can even drop the requirement of identical distributions 
(but not the independence!) and still have the same result. There is a catch, however: the 
individual distributions have to be narrow, i.e. their first and second moment (Lindberg), 
and in Lyapunov version also the third moment, have to be finiteQ 

Taking into considerations also other, non-gaussian distributions. Levy obtained the 
following condition for the Fourier transform of probability density of the sum of iid 
random variables: 



random variable dominates others in the sum. Finiteness of third moments is necessary but not enough for that. 



{R\t)) = Vf, with V = 1 



(1) 




where the normahzation constant was dropped, and < (3 < 2. 

Going back to the searched distribution, not its Fourier transform, is not trivial, and 
the analytical form of the result is known only for few special cases. Generally it may be 
expressed as 

L(x) = — exp (— 7^^^) cosqx dq (3) 

TT JO ^ ^ 

and is known as symmetrical Levy stable distribution of index (3 {0 < (3 < 2) and scale 
factor 7 (7 > 0). For simplicity one usually sets 7 = 1. 

The special cases mentioned earlier are: 

• Cauchy distribution (among physicists known also as Lorentzian shape): 

Pn{x) = -L = i- (I) for P = 1, and (4) 

• Gauss normal distribution, when j3 = 2. 

The integral (^) can be written in a form of (truncated) power series 0] 

I ^ {-if V {(5k + I) . fkTT/3\ „ 

Li^) = E fe, ^,.+1 j + Rrr.{x) (5) 

with Rm{x) of order and the leading term is proportional to x~^^^ . Looking at 

the above series and original Levy's result (§), one can see that the searched probability 
density should behave as 

L{x) ~ as |x| 00 (6) 

Now we understand, why the index (3 must belong to the interval ] 0, 2 ]: for /3 < integral 
(0) does not exist (is unbounded), while for (3 >2 ordinary Central Limit Theorem holds. 
It is also clear, that there are some Levy distributions, those with index < /5 < 1, for 
which even the first moment, i.e. expectation value, does not exist (second moment, i.e. 
the variance, is always infinite). This poses a serious problem for physicists, since the 
ordinary procedure of repeated measurements makes no sense in such cases, and if used 
nevertheless - leads to strange, confusing and incorrect results. 

in. Diffusive processes, continued 
Allowing the random walker to make steps of length / distributed^ as 

with appropriate normalization constant C, one can show that the interesting quantity 
(i?2(t)) follows the law 

t^ 0<(3 <1 

tyint p = l 

{R^{t)) ~ { t3-/3 l</3<2 (8) 
tint (3 = 2 
t (3>2 

■^To be precise: we should write 1/Iq here, instead of just I, where Iq denotes unit length, in order to operate with 
dimensionless quantities only. The reason for not doing so is following: we don't want to create the impression 
that any particular length scale is better than others; indeed any unit ranging from femtometers to astronomical 
unit is equally good. That is why the behavior described by power law is often called 'scale free' - no length unit 
is preferred, except for practical reasons. 



assuming that I = vt, and v = const > during every jump. 

The distribution (|^) approximates very well the Levy distribution for large arguments, 
see relation (^. Some authors prefer to use another approximation for Levy distribution: 
P{1) ~ (for large /), inferring then that ~ t^/^. They use P(/) = const 

for small /. We prefer our form, since it never produces infinite densities of probability, 
while retaining desired asymptotic properties. 
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Fig. 1 

Example of Levy flight with index f3 — 1.99 < 2. Here, and in subsequent figures, 

PRESENTED ARE THE TRAJECTORIES CONSISTING OF 500 STRAIGHT SECTIONS, OBTAINED WITH 
POWER LAW DISTRIBUTION OF STEP SIZES APPROXIMATING LEVY DISTRIBUTION, SEE TEXT, AND 
UNIFORM DISTRIBUTION OF DIRECTIONS IN PLANE. THE SEQUENCES OF NUMBERS PRODUCED BY 
UNDERLYING UNIFORM RANDOM NUMBER GENERATOR ARE IDENTICAL IN ALL FIGURES AND THE 

RANDOM WALK ALWAYS STARTS FROM POSITION (0,0). CLEARLY VISIBLE ARE CHARACTERISTIC 
FEATURES OF TRAJECTORIES: THEY ARE CLUSTERS OF CLUSTERS OF CLUSTERS OF . . . 



It is interesting, that in such a vigorous movement as the turbulence, the squared average 
distance from start point, for any particle observed in a coordinate system moving with 
the fluid (Lagrange's coordinates), may be characterized by /? = 5/3. Other physical 
systems described by Levy distribution are mentioned in p|, [|]. Especially, many physical 
quantities in phase transition region behave according to the power law. Among them 
we can find relaxing sand piles, magnetic systems, etc. The lengths of flights made by 
albatrosses are also distributed according to power law. Other examples from everyday life 
include stock market price fluctuations, www network connectivity (number of computers 
connected to the given node), compacting the granular systems and . . . the number of 
goals per soccer game. 



Why Levy distribution may be useful for stochastic optimization? 

In global stochastic optimization we need two essential ingredients, in some sense acting 
against each other. One of them is the routine, which finds efficiently the local extremum 
when the search process happens to be nearby, and the other - a way to escape from local 
extremum, since it may be not the global one. It is common to observe during evolutionary 
optimization the prolonged periods of relatively small improvements followed by sudden, 
rapid transitions to another local extremum. The process may take long time simply 
because the random walkers move and explore the search space too slowly, i.e. they 
make too small and hence too cautious steps. Using step size generated accordingly to 
one of Levy distributions instead of uniform or gaussian distribution should therefore 
be advantageous. The population of random walkers will be always concentrated 
around recently found extremum, as it should in evolutionary algorithms, and in 
the same time always few population members will explore more distant regions of 
search space. This happens with normally distributed walkers only very rarely. The 
ratio between two classes of random walkers may be easily controlled, in a smooth way, 
by appropriate choice of index /?. This is illustrated in Figs. |l] — |^. 
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Fig. 2 

Levy flight with index /3 = 1.67 « 5/3 (turbulent case, upper curve shifted 400 units up) 

AND f3 ~ 1.50 (lower curve). 



True Levy distribution is hard to implement in computer code, but the approximate 
form, like the one given by Eq. (|^), is easy (see the next section). The optimal choice 
of index (3 may be problem-dependent, but should not be critical. Further investigations 
(experiments?) are necessary to address this question. We suppose, that this choice should 
be concentrated mainly in the range ]0, 1], as largely unexplored until now. Values 
/? > 1 result in slower spreading of random walkers in search space, what may very 
significantly affect their ability to find desired extremum, when the problem is defined 



on unbounded domain. On the other hand, the case < f3 < 1 may be considered 
as a computer imitation of the phenomenon known from quantum mechanics, namely 
quantum tunneling, see Fig. ^. It is interesting to note, that we have obtained this 
behavior without even mentioning the quantum mechanical methods. 
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Fig. 3 

Levy flight with index /3 — 0.67 < 1. Note the dramatic scale change 1000 x) 
comparing to previous figures. contrary to earlier presented cases, the average 
position does not exist, so this walker has capability to explore unbounded domains. 
Recall that this trajectory also consists of 500 sections. 



One may wonder, whether such an optimization procedure can still be classified as an 
evolutionary algorithm. Our answer is yes, because one can always identify long jumps 
in search space with mutations, but remember that some researchers are simply ruling 
out mutations from algorithms regarded as evolutionary, reserving them instead for — as 
they call it — genetic-type routines. 

We have to mention here, that similar behavior, i.e. occasional long jumps, was intro- 
duced earlier, rather heuristically, by Galar ^ and Kopciuch under the amusing name of 
impatience operator and interpreted there mainly in context of social sciences. 

IV. Example FORTRAN 77 procedure for generating Levy flights 

Here we present the random number generator, implemented in FORTRAN 77, which 
produces sequences of numbers distributed according to Eq. (|^). It is intentionally not 
optimized and works by inverting the distributive function, which is given by 

V-\0 = l/{l-e^^-l, (9) 

where ^ is uniformly distributed on [0, 1]. The above form may be safely simplified to 

v-\o = C^'^ - 1 (10) 



by replacement 1 — ^ ^ 



DOUBLE PRECISION FUNCTION LEVYl (X, BETA) 
DOUBLE PRECISION X, BETA, R, RANF 
R = RANF(X) 

LEVYl = l.DO/R**(l. DO/BETA) - l.DO 

RETURN 

END 

RANF is the name of any available standard, i.e. uniformly distributed on ] 0, 1 ], random 
number generator taking X (of type DOUBLE PRECISION) as a dummy argument. 

There is no check, whether /3 G [0, 2[. The subroutine will work even for /3 > 2, however 
one should not expect to obtain normally distributed random numbers in such case. 

V. Summary and discussion 

In this paper we have described two distinct, but closely related classes of stochastic 
optimization algorithms, based on a single mathematical model and being the computer 
counterparts of two different physical effects: classical diffusion and quantum tunneling. 
They received unified mathematical background and may be distinguished according to 
the properties of random walkers, as summarized below: 



properties 


Levy index 


physical effect 


no moment exists 


< < 1 


quantum tunneling 


only first moment exists 


1< ^ < 2 


superdiffusion, including turbulence 


gaussian distribution 


(3 = 2 


diffusion (Brownian motion) 


unknown with a'^ < oo 


I3>2 


sub diffusion 




or not applicable 





The unexpected similarities between classical diffusion processes and quantum tunneling 
have their roots probably in properties (similarities) of the corresponding partial differ- 
ential equations describing them. Both equations, i.e. the diffusion equation (Pick's law) 
and Schrodinger equation relate first partial derivatives of the unknown function with 
respect to time with its second spatial derivatives. The important difference is the ex- 
plicit presence of imaginary unit, i, in Schrodinger's equation. The algorithm we describe 
here is able to mimic the properties of both types of solutions. It can be easily switched 
from one type of behavior to another one by merely changing the value of a single control 
parameter, i.e. Levy index. 

The question arises, whether the familiar evolutionary algorithms should be immediately 
thrown away in favor of Levy flights based ones. Even, if we stick to the orthodox 
definition of evolutionary algorithms as the ones, which accept only small, gradual changes 
in position within the searched domain - then the answer is no. The main problem 
with evolutionary algorithms is their poor ability to escape from unwanted, suboptimal 
extrema. Indeed, if evolutionary random walkers make steps with lengths distributed 
uniformly on [O,/^^^;], then they are unable to escape from extremum, which is wider 
than ~ 2l^ax- Using normally distributed numbers as step lengths - good in theory - 
doesn't help much in practice. And here is why: gaussian generators of poor quality never 
produce random numbers exceeding few (say ~ 3) o" in magnitude. On the other hand, 
even perfect normal generators produce very rarely steps longer than that. So it is a pure 
illusion, that it is possible to find the global optimum in reasonable time with one of such 
algorithms - even if the appropriate theorem states so. This may only happen, when 
our first approximation to the solution is already quite good. Quite different situation 



occurs, when our goal is to keep track and adapt to the varying environment, i.e. when 
objective function changes smoothly and slowly enough while the optimization procedure 
is in progress (for example satellite or missile tracking). In such cases, providing the 
starting point is already well known - i.e. is located closer to the global optimum than 
to any other one - the evolutionary algorithm, without any mutations, is indispensable. 
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